We describe a procedure for finding $\partial E/\partial w_{ij}$, where $E$ is an arbitrary
functional of the temporal trajectory of the states of a continuous recurrent network
and $w_{ij}$ are the weights of that network. An embellishment of this procedure
involving only computations that go forward in time is also described.
Computing these quantities allows one to perform gradient descent in the weights to
minimize $E$, so our procedure forms the kernel of a new connectionist learning
algorithm.
1 Introduction

Pineda [2] has shown how to train the fixpoints of a recurrent, temporally continuous
generalization of backpropagation networks [3]. Such networks are governed by the
coupled differential equations

$$T_i \frac{dy_i}{dt} = -y_i + \sigma(x_i) + I_i \tag{1}$$

where

$$x_i = \sum_j w_{ji} y_j$$

is the total input to unit $i$, $y_i$ is the state of unit $i$, $T_i$ is the time constant of unit $i$, $\sigma$ is
an arbitrary differentiable function¹, $w_{ij}$ are the weights, and the boundary conditions
$y(t_0)$ and driving functions $I_i$ are the inputs to the system. See figure 2 for a graphical
representation of this equation.

¹ Typically $\sigma(\xi) = (1 + e^{-\xi})^{-1}$, in which case $\sigma'(\xi) = \sigma(\xi)(1 - \sigma(\xi))$.


Consider $E(y)$, an arbitrary functional of the trajectory taken by $y$ between $t_0$
and $t_1$.² Below, we develop a technique for computing $\partial E(y)/\partial w_{ij}$ and $\partial E(y)/\partial T_i$,
thus allowing us to do gradient descent in the weights and time constants so as to
minimize $E$. The computation of $\partial E/\partial w_{ij}$ seems to require a phase in which the
network is run backwards in time, but a trick for avoiding this is also developed.


2 The Equations

Let us define

$$e_i(t) = \lim_{\epsilon \to 0} \frac{1}{\epsilon} \frac{\delta E(y)}{\delta y_i[t, t+\epsilon]} \tag{2}$$

In the usual case where $E$ is of the form $E(y) = \int_{t_0}^{t_1} f(y(t), t)\,dt$ this means that
$e_i(t) = \partial f(y(t), t)/\partial y_i(t)$. Intuitively, $e_i(t)$ measures how much a small change to $y_i$ at
time $t$ affects $E$ if everything else is left unchanged. We also define


$$z_i(t) = \left.\frac{\partial E(y^{(i,t,\epsilon)})}{\partial \epsilon}\right|_{\epsilon = 0} \tag{3}$$


where $y^{(i,t,\epsilon)}$ is the same as $y$ except that $dy_i/dt$ has a Dirac delta function of magnitude
$\epsilon$ added to it at time $t$. Intuitively, $z_i(t)$ measures how much a small change to $y_i$
at time $t$ affects $E$ when the change to $y_i$ is propagated forward through time and
influences the remainder of the trajectory.


Figure 1: The infinitesimal changes to $y$ considered in $e_i(t)$ (left) and $z_i(t)$ (right).


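We can approximate (1) with the difference equation

$$y_i(t + \Delta t) \approx y_i(t) + \Delta t \frac{dy_i}{dt}(t)$$

or

$$y_i(t + \Delta t) = \left(1 - \frac{\Delta t}{T_i}\right) y_i(t) + \frac{\Delta t}{T_i} \sigma(x_i(t)) + \frac{\Delta t}{T_i} I_i(t) \tag{4}$$

which is exact in the limit as $\Delta t \to 0$.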
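As a concrete reading of (4), the following is a minimal Python sketch of one Euler step and a forward simulation. It assumes the logistic function for $\sigma$ and a constant drive $I$; the names `euler_step` and `simulate`, and the nested-list layout in which `w[j][i]` is the weight from unit j to unit i, are illustrative choices rather than anything specified in the text.

```python
import math

def sigmoid(xi):
    """Logistic squashing function (an assumed choice for sigma)."""
    return 1.0 / (1.0 + math.exp(-xi))

def euler_step(y, w, T, I, dt):
    """One application of difference equation (4).

    y: list of unit states, w[j][i]: weight from unit j to unit i,
    T: list of time constants, I: list of (constant) driving inputs.
    """
    n = len(y)
    # Total input x_i = sum_j w_ji * y_j
    x = [sum(w[j][i] * y[j] for j in range(n)) for i in range(n)]
    return [(1.0 - dt / T[i]) * y[i] + (dt / T[i]) * (sigmoid(x[i]) + I[i])
            for i in range(n)]

def simulate(y0, w, T, I, dt, steps):
    """Return the list of states y(t0), y(t0 + dt), ..., after `steps` steps."""
    traj = [list(y0)]
    for _ in range(steps):
        traj.append(euler_step(traj[-1], w, T, I, dt))
    return traj
```

For example, `simulate([0.0], [[0.0]], [1.0], [0.0], 0.1, 200)` relaxes a single unconnected unit toward the fixpoint $y = \sigma(0) = 0.5$.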

² For instance, $E = \int_{t_0}^{t_1} (y_0(t) - f(t))^2\,dt$ measures the deviation of $y_0$ from the function $f$, and minimizing
this $E$ would teach the network to have $y_0$ imitate $f$.
Figure 2: A lattice representation of (4).


Consider incrementing $y_i(t)$ by $\epsilon$ and letting this change propagate forward. The
differential of $E(y)$ w.r.t. $\epsilon$ is thus the sum of the differentials of $E(y)$ w.r.t. the other
values that $y_i(t)$ influences, weighted by the magnitude of its influence. By examining
all the outgoing lines from node $y_i(t)$ in figure 2 we are led to a difference equation
for $z_i(t)$,

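$$z_i(t) \approx \left(1 - \frac{\Delta t}{T_i}\right) z_i(t + \Delta t) + \Delta t\, e_i(t) + \sum_j w_{ij}\, \sigma'(x_j(t))\, \frac{\Delta t}{T_j}\, z_j(t + \Delta t) \tag{5}$$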

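where the $(1 - \Delta t/T_i)\, z_i(t + \Delta t)$ term is due to the linear influence $y_i(t)$ has upon $y_i(t + \Delta t)$,
the $\sum_j$ term is due to the effect that changing $y_i(t)$ has upon the other $y_j(t + \Delta t)$ through
their nonlinear coupling, and the $\Delta t\, e_i(t)$ term is due to the effect that changing $y_i$
between times $t$ and $t + \Delta t$ has directly upon $E$. By rewriting (5) as

$$z_i(t) = z_i(t + \Delta t) - \Delta t \left( \frac{1}{T_i} z_i(t + \Delta t) - e_i(t) - \sum_j \frac{1}{T_j} w_{ij}\, \sigma'(x_j(t))\, z_j(t + \Delta t) \right),$$

assuming this to be of the form $z_i(t) = z_i(t + \Delta t) - \Delta t\, \frac{dz_i}{dt}(t + \Delta t)$, and taking the
limit as $\Delta t \to 0$ we obtain a differential equation,

$$\frac{dz_i}{dt} = \frac{1}{T_i} z_i - e_i - \sum_j \frac{1}{T_j} w_{ij}\, \sigma'(x_j)\, z_j. \tag{6}$$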


Let

$$v_{ij}(t) = \left.\frac{\partial E(y^{(i,j,t,\epsilon)})}{\partial \epsilon}\right|_{\epsilon = 0} \tag{7}$$


where $y^{(i,j,t,\epsilon)}$ is the same as $y$ except that $w_{ij}$ is increased by $\epsilon$ from $t$ through $t_1$.
Again examining figure 2, we see that the appropriate difference equation for $v$ is


$$v_{ij}(t) = v_{ij}(t + \Delta t) + \Delta t\, y_i(t)\, \sigma'(x_j(t))\, \frac{1}{T_j}\, z_j(t + \Delta t)$$


which leads to the differential equation

$$\frac{dv_{ij}}{dt} = -\frac{1}{T_j}\, y_i\, \sigma'(x_j)\, z_j$$

which we can integrate from $t_0$ to $t_1$. By substituting $v_{ij}(t_1) = 0$ and $v_{ij}(t_0) = \partial E/\partial w_{ij}$
into the resulting equation we eliminate $v$ and end up with


$$\frac{\partial E}{\partial w_{ij}} = \frac{1}{T_j} \int_{t_0}^{t_1} y_i\, \sigma'(x_j)\, z_j\, dt. \tag{8}$$
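The integration that produces (8) can be spelled out in two lines. The display below only restates the substitution described above, using the boundary values $v_{ij}(t_1) = 0$ and $v_{ij}(t_0) = \partial E/\partial w_{ij}$ together with the differential equation for $v_{ij}$ read off from its difference equation.

```latex
\begin{align*}
v_{ij}(t_1) - v_{ij}(t_0)
  &= \int_{t_0}^{t_1} \frac{dv_{ij}}{dt}\,dt
   = -\frac{1}{T_j} \int_{t_0}^{t_1} y_i\,\sigma'(x_j)\,z_j\,dt \\
\Longrightarrow\quad
0 - \frac{\partial E}{\partial w_{ij}}
  &= -\frac{1}{T_j} \int_{t_0}^{t_1} y_i\,\sigma'(x_j)\,z_j\,dt
\end{align*}
```

Cancelling the minus signs gives the integral form of the weight gradient.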


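If we substitute $\rho_i = T_i^{-1}$ into (4), find $\partial E/\partial \rho_i$ by proceeding analogously, and
substitute $T_i$ back in we get

$$\frac{\partial E}{\partial T_i} = -\frac{1}{T_i} \int_{t_0}^{t_1} z_i\, \frac{dy_i}{dt}\, dt. \tag{9}$$

We will find a way to compute $\partial z_i(t_1)/\partial z_j(t_0)$ useful. Let us define

$$\zeta_{ij}(t) = \frac{\partial z_i(t)}{\partial z_j(t_0)} \tag{10}$$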


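and take the partial of (6) with respect to $z_j(t_0)$, substituting in $\zeta$ where appropriate.
This results in a differential equation for $\zeta_{ij}$,

$$\frac{d\zeta_{ij}}{dt} = \frac{1}{T_i} \zeta_{ij} - \sum_k \frac{1}{T_k} w_{ik}\, \sigma'(x_k)\, \zeta_{kj} \tag{11}$$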


and for boundary conditions we note that

$$\zeta_{ij}(t_0) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases} \tag{12}$$
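Because (6) is linear in $z$ once the trajectory of $y$ is fixed, $z(t_1)$ depends affinely on $z(t_0)$, and $\zeta(t_1)$ is the matrix of that dependence. The sketch below Euler-integrates $y$, $z$ and $\zeta$ forward together. It is an illustration under stated assumptions: logistic $\sigma$, constant drive `I`, and an instantaneous error $e_i = y_i - d_i$ for a constant target `d`. The function name `forward_with_zeta` and the list layout (`w[i][j]` is the weight from unit i to unit j) are hypothetical.

```python
import math

def sigmoid(xi):
    return 1.0 / (1.0 + math.exp(-xi))

def forward_with_zeta(y0, z0, w, T, I, d, dt, steps):
    """Euler-integrate y by (4), z by (6) and zeta by (11), all forward in time.

    Assumes e_i = y_i - d_i for a constant target d (illustrative choice).
    Returns (z(t1), zeta(t1)).
    """
    n = len(y0)
    y, z = list(y0), list(z0)
    # Boundary condition (12): zeta(t0) is the identity matrix.
    zeta = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        x = [sum(w[i][j] * y[i] for i in range(n)) for j in range(n)]
        s = [sigmoid(xj) for xj in x]
        sp = [sj * (1.0 - sj) for sj in s]            # sigma'(x_j)
        # Right-hand side of (6)
        dz = [z[i] / T[i] - (y[i] - d[i])
              - sum(w[i][j] * sp[j] * z[j] / T[j] for j in range(n))
              for i in range(n)]
        # Right-hand side of (11)
        dzeta = [[zeta[i][j] / T[i]
                  - sum(w[i][k] * sp[k] * zeta[k][j] / T[k] for k in range(n))
                  for j in range(n)] for i in range(n)]
        y = [y[j] + dt / T[j] * (-y[j] + s[j] + I[j]) for j in range(n)]
        z = [z[i] + dt * dz[i] for i in range(n)]
        zeta = [[zeta[i][j] + dt * dzeta[i][j] for j in range(n)]
                for i in range(n)]
    return z, zeta
```

Running the function from two different guesses for $z(t_0)$ and comparing the difference in $z(t_1)$ with $\zeta(t_1)$ applied to the difference in $z(t_0)$ is a simple consistency check on an implementation.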


One can also derive (6), (8) and (9) using the calculus of variations and Lagrange
multipliers (Dr. William Skaggs, personal communication).


3 Utilization

The most straightforward way to use (6), (8) and (9) is to simulate the system $y$
forward from $t_0$ to $t_1$, set the boundary conditions $z_i(t_1) = 0$, and simulate the system
$z$ backwards from $t_1$ to $t_0$ while numerically integrating $z_j\, \sigma'(x_j)\, y_i$ and $z_i\, dy_i/dt$, thus
computing $\partial E/\partial w_{ij}$ and $\partial E/\partial T_i$. Aside from the practical problems of simulating
the system backwards in an actual learning application, the backwards simulation
of $z$ as well as the integrals being computed require that $y$ also be run backwards,
necessitating either remembering the trajectory of $y$, which can require prohibitive
amounts of storage, or the backwards simulation of $y$ itself, which is typically ill
conditioned.
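The forward-then-backward procedure just described can be sketched in a few lines of Python for the discretized system. This is a minimal illustration, not the authors' implementation: it stores the whole trajectory of $y$, and it assumes logistic $\sigma$, constant drive `I`, and the error $E = \sum_k \Delta t\, \tfrac{1}{2} \sum_i (y_i(t_k) - d_i)^2$ for a constant target `d`. The names `forward`, `error` and `gradients` are hypothetical, and `w[i][j]` is the weight from unit i to unit j.

```python
import math

def sigmoid(xi):
    return 1.0 / (1.0 + math.exp(-xi))

def forward(y0, w, T, I, dt, steps):
    """Euler simulation of (4); returns the stored trajectory of y."""
    n = len(y0)
    traj = [list(y0)]
    for _ in range(steps):
        y = traj[-1]
        x = [sum(w[i][j] * y[i] for i in range(n)) for j in range(n)]
        traj.append([(1 - dt / T[j]) * y[j] + dt / T[j] * (sigmoid(x[j]) + I[j])
                     for j in range(n)])
    return traj

def error(traj, d, dt):
    """Rectangle-rule E: sum over steps of dt * 1/2 * sum_i (y_i - d_i)^2."""
    return sum(dt * 0.5 * sum((yi - di) ** 2 for yi, di in zip(y, d))
               for y in traj[:-1])

def gradients(y0, w, T, I, d, dt, steps):
    """Backward pass: step z with (5) from z(t1) = 0, accumulating (8) and (9)."""
    n = len(y0)
    traj = forward(y0, w, T, I, dt, steps)
    z = [0.0] * n                                  # boundary condition z_i(t1) = 0
    grad_w = [[0.0] * n for _ in range(n)]
    grad_T = [0.0] * n
    for k in range(steps - 1, -1, -1):
        y, y_next = traj[k], traj[k + 1]
        x = [sum(w[i][j] * y[i] for i in range(n)) for j in range(n)]
        sp = [sigmoid(xj) * (1 - sigmoid(xj)) for xj in x]   # sigma'(x_j)
        for i in range(n):
            # Discrete form of (9): dy_i/dt * dt is y_i(t + dt) - y_i(t).
            grad_T[i] += -z[i] * (y_next[i] - y[i]) / T[i]
            for j in range(n):
                # Discrete form of (8); z currently holds z(t + dt).
                grad_w[i][j] += dt * y[i] * sp[j] * z[j] / T[j]
        # Difference equation (5) with e_i = y_i - d_i.
        z = [(1 - dt / T[i]) * z[i] + dt * (y[i] - d[i])
             + sum(w[i][j] * sp[j] * dt / T[j] * z[j] for j in range(n))
             for i in range(n)]
    return grad_w, grad_T
```

A quick sanity check is to compare the returned values against central finite differences of `error(forward(...), d, dt)` with respect to individual weights and time constants. The stored list `traj` is exactly the memory cost that the text identifies as potentially prohibitive.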

However, running the system backwards can be avoided. Given guesses for the
correct values of $z_i(t_0)$, one can simulate $y$, $z$ and $\zeta$ forward from $t_0$ to $t_1$ and then
update the guesses in order to minimize $B$ where

$$B = \frac{1}{2} \sum_i z_i(t_1)^2 \tag{13}$$


by making use of the fact that

$$\frac{\partial B}{\partial z_j(t_0)} = \sum_i z_i(t_1)\, \zeta_{ij}(t_1). \tag{14}$$


For notational convenience, let $b_i = \partial B/\partial z_i(t_0)$. We can use a Newton-Raphson
method with the appropriate modification for the fact that $B$ has a minimum of zero,
resulting in the simple update rule

$$z_i(t_0) \leftarrow z_i(t_0) - \frac{B}{\sum_j b_j^2}\, b_i. \tag{15}$$

During our simulation we can accumulate the appropriate integrals, so if our guesses
for $z_i(t_0)$ were nearly correct we will have computed nearly correct values for $\partial E/\partial w_{ij}$
and $\partial E/\partial T_i$. If the $w_{ij}$ change slowly the correct values for $z_i(t_0)$ will change slowly,
so tolerable accuracy can be obtained by using the $\partial E/\partial w_{ij}$ computed from the
slightly incorrect values for $z_i(t_0)$ while simultaneously updating the $z_i(t_0)$ for future
use, eliminating the need for an inner loop which iterates to find the correct values
for the $z_i(t_0)$. This argument assumes that the quadratic convergence of the
Newton-Raphson method dominates the linear divergence of the changes to the $w_{ij}$, which
can be guaranteed by choosing suitably low learning parameters.
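The forward-only scheme can be sketched as one forward pass that returns $B$, its gradient $b$, and the accumulated weight gradient, followed by the update (15). This is a hedged illustration using Euler integration, logistic $\sigma$, constant drive `I`, and $e_i = y_i - d_i$ for a constant target `d`; the names `shoot` and `update_guess` are hypothetical, and `w[i][j]` is the weight from unit i to unit j.

```python
import math

def sigmoid(xi):
    return 1.0 / (1.0 + math.exp(-xi))

def shoot(y0, z0, w, T, I, d, dt, steps):
    """Forward Euler pass for y (4), z (6) and zeta (11) from a guess z0 for z(t0).

    Returns B from (13), b from (14), and dE/dw accumulated with (8) using the
    current, possibly inaccurate, z trajectory.
    """
    n = len(y0)
    y, z = list(y0), list(z0)
    zeta = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    grad_w = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        x = [sum(w[i][j] * y[i] for i in range(n)) for j in range(n)]
        s = [sigmoid(xj) for xj in x]
        sp = [sj * (1.0 - sj) for sj in s]            # sigma'(x_j)
        for i in range(n):
            for j in range(n):
                grad_w[i][j] += dt * y[i] * sp[j] * z[j] / T[j]
        dz = [z[i] / T[i] - (y[i] - d[i])
              - sum(w[i][j] * sp[j] * z[j] / T[j] for j in range(n))
              for i in range(n)]
        dzeta = [[zeta[i][j] / T[i]
                  - sum(w[i][k] * sp[k] * zeta[k][j] / T[k] for k in range(n))
                  for j in range(n)] for i in range(n)]
        y = [y[j] + dt / T[j] * (-y[j] + s[j] + I[j]) for j in range(n)]
        z = [z[i] + dt * dz[i] for i in range(n)]
        zeta = [[zeta[i][j] + dt * dzeta[i][j] for j in range(n)]
                for i in range(n)]
    B = 0.5 * sum(zi * zi for zi in z)
    b = [sum(z[i] * zeta[i][j] for i in range(n)) for j in range(n)]
    return B, b, grad_w

def update_guess(z0, B, b):
    """Update rule (15): z_i(t0) <- z_i(t0) - (B / sum_j b_j^2) * b_i."""
    norm2 = sum(bj * bj for bj in b)
    if norm2 == 0.0:                    # gradient vanished; leave the guess alone
        return list(z0)
    return [z0[i] - B / norm2 * b[i] for i in range(len(z0))]
```

In a learning loop one would call `shoot`, take a small gradient step on the weights using `grad_w`, and apply `update_guess` once per pass rather than iterating it to convergence. For a single unit, $z(t_1)$ is an affine function of $z(t_0)$, so one application of (15) halves $z(t_1)$ and reduces $B$ by a factor of four in this discretization.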


4 Future Work

We are planning on performing the following experiments in the immediate future:

• Learn a simple xor problem, with the functional requiring the output to be
correct after 2 time units.


• Follow a square trajectory in state space, where the desired trajectories of two
visible units are specified explicitly using a functional of the form

$$E = \frac{1}{2} \sum_i \int_{t_0}^{t_1} s_i\, (y_i - d_i)^2\, dt \tag{16}$$

where $d_i$ is the desired trajectory for $y_i$ and $s_i$ is the importance of $y_i$ attaining
$d_i$ at time $t$. For this functional, the instantaneous error takes on the particularly
simple form $e_i = s_i (y_i - d_i)$. Note that following a square trajectory requires
the use of hidden units.
• Teach two visible units to follow a circular trajectory in state space, but rather
than specifying the trajectory explicitly, require that the trajectory be on the
circle with center $(c_1, c_2)$ and radius $r$ and that the velocity be $v$ using a functional
like

$$E = \int_{t_0}^{t_1} \left( (y_1 - c_1)^2 + (y_2 - c_2)^2 - r^2 \right)^2 + \left( \dot{y}_1^2 + \dot{y}_2^2 - v^2 \right)^2 dt \tag{17}$$
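The two trajectory functionals in the list above can be written down directly for a sampled trajectory. The sketch below is illustrative only: it evaluates the instantaneous error for (16) and a rectangle-rule approximation of (17), estimating the velocities by forward differences. The function names and the representation of a trajectory as a list of `[y1, y2]` samples spaced `dt` apart are assumptions made for the example.

```python
import math

def tracking_error(y, d, s):
    """Instantaneous error e_i = s_i * (y_i - d_i) for the functional (16)."""
    return [si * (yi - di) for yi, di, si in zip(y, d, s)]

def circle_integrand(y, ydot, c, r, v):
    """Integrand of (17) for two visible units with state y and velocity ydot."""
    radial = (y[0] - c[0]) ** 2 + (y[1] - c[1]) ** 2 - r ** 2
    speed = ydot[0] ** 2 + ydot[1] ** 2 - v ** 2
    return radial ** 2 + speed ** 2

def circle_functional(traj, dt, c, r, v):
    """Rectangle-rule estimate of (17); velocities come from forward differences."""
    total = 0.0
    for k in range(len(traj) - 1):
        ydot = [(traj[k + 1][i] - traj[k][i]) / dt for i in range(2)]
        total += dt * circle_integrand(traj[k], ydot, c, r, v)
    return total
```

A trajectory that moves around the circle at speed $v$ should give a value close to zero, while a trajectory that sits still at the center is penalized by both terms.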

Assuming that these simulations are successful, we are planning on using this
procedure in the domain of control as part of the author's thesis work on learning to
control robot manipulators using connectionist networks [1].